Resource allocation is an important issue in cognitive radio systems. It can be done by carrying out negotiation among secondary users. However, negotiation may incur significant overhead, since it must be repeated frequently due to the rapid change of primary users' activity. In this paper, a channel selection scheme without negotiation is considered for multi-user, multi-channel cognitive radio systems. To avoid the collisions incurred by the lack of coordination, each secondary user learns how to select channels from its own experience. Multi-agent reinforcement learning (MARL) is applied in the framework of Q-learning by treating the other secondary users as part of the environment. The dynamics of the Q-learning are illustrated using a Metrick-Polak plot. A rigorous proof of the convergence of the Q-learning is provided via its similarity to the Robbins-Monro algorithm, together with a convergence analysis of the corresponding ordinary differential equation (via a Lyapunov function). Examples are given, and the performance of the learning scheme is evaluated by numerical simulations.
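The negotiation-free scheme summarized above can be sketched as follows. This is a minimal illustrative toy, not the paper's actual formulation: the stateless Q-tables, the binary collision reward, and all parameter names (`ALPHA`, `EPSILON`, etc.) are assumptions made for the sketch. Each secondary user runs its own Q-learning over the channels and treats the other users purely as part of the environment, so no coordination messages are exchanged.

```python
import random

# Illustrative parameters (assumptions, not from the paper).
N_USERS, N_CHANNELS = 2, 2
ALPHA, EPSILON, EPISODES = 0.1, 0.1, 5000

# One Q-table per secondary user. Because the other users are treated
# as part of the environment, each table is indexed by channel only
# (a stateless simplification).
Q = [[0.0] * N_CHANNELS for _ in range(N_USERS)]

def choose(q):
    """Epsilon-greedy channel selection from one user's Q-values."""
    if random.random() < EPSILON:
        return random.randrange(N_CHANNELS)
    return max(range(N_CHANNELS), key=lambda c: q[c])

random.seed(0)
for _ in range(EPISODES):
    # All users pick channels independently, without negotiation.
    picks = [choose(Q[u]) for u in range(N_USERS)]
    for u, ch in enumerate(picks):
        # Reward 1 for a collision-free transmission, 0 on collision.
        r = 1.0 if picks.count(ch) == 1 else 0.0
        # Stochastic-approximation (Robbins-Monro style) Q update.
        Q[u][ch] += ALPHA * (r - Q[u][ch])

# After learning, each user's greedy choice tends toward a channel
# not favored by the others.
final = [max(range(N_CHANNELS), key=lambda c: Q[u][c]) for u in range(N_USERS)]
print(final)
```

The update `Q += ALPHA * (r - Q)` has the stochastic-approximation form the abstract alludes to, which is why its convergence can be studied through the Robbins-Monro framework and the associated ordinary differential equation.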